WHEW! Okay, time for some plots. A much more rewarding endeavor. Plot making! Make sure that you have installed the ggplot2 package.
Here is a brief description of the basic building blocks of a ggplot.
| argument | description of component |
|---|---|
| data | as a data.frame (long format!) |
| aesthetic (aes) | mapping variables to visual properties: position, colour, line type, size |
| geom | actual visualisation of the data |
| scale | map values to the aesthetics, colour, size, shape (show up as legends and axes) |
| stat | statistical transformations, summaries of data (e.g., line fits) |
| facet | splitting data across panels based on different subsets of the data |
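To make the table concrete, here is a rough sketch of one plot that touches all six building blocks. The choice of variables and the name p_blocks are just for illustration; we will build each piece up slowly in the rest of this section.

```r
library(ggplot2)
library(gapminder)

# One plot, six building blocks (p_blocks is an illustrative name)
p_blocks <- ggplot(data = subset(gapminder, year == 2007),  # data
                   aes(x = gdpPercap, y = lifeExp,          # aesthetics
                       colour = continent)) +
  geom_point() +                                            # geom
  scale_x_log10() +                                         # scale
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE) + # stat
  facet_wrap(~ continent)                                   # facet

p_blocks
```

Don't worry if some of these functions look unfamiliar for now.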
Let’s start with a basic scatterplot of life expectancy over time. You’ll notice that we are telling ggplot that we will be using the gapminder data (a data.frame!) and then telling it that we want the year on the x-axis and life expectancy on the y. After that, we need to use the + to indicate that we want to add another layer - in this case we need to add points.
# Load ggplot2
library(ggplot2)
# Load gapminder
library(gapminder)
## Warning: package 'gapminder' was built under R version 3.1.3
# Basic scatterplot
ggplot(data = gapminder, aes(x = year, y = lifeExp)) +
geom_point()
Now let’s add some colour. Yes, 'colour' or 'color' can be used in the ggplot functions.
# We're going to colour the discrete variable continent
ggplot(data = gapminder, aes(x = year, y = lifeExp, colour = continent)) +
geom_point()
Just as you can assign vectors, data.frames, and other R objects to a variable, you can also assign ggplots to variables.
p <-
ggplot(data = gapminder, aes(x = year, y = lifeExp, colour = continent)) +
geom_point()
And as we’ve seen before, no plot has been produced because it has been stored as the variable p. To view our plot, we can just call that variable.
p
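Because the plot lives in a variable, you can keep adding layers to it later. Here is a small sketch of that idea; p_lines is a made-up name, and group = country is used so that each country gets its own line.

```r
library(ggplot2)
library(gapminder)

p <- ggplot(data = gapminder, aes(x = year, y = lifeExp, colour = continent)) +
  geom_point()

# Adding a layer returns a new plot object; p itself is left unchanged
p_lines <- p + geom_line(aes(group = country))

p        # points only
p_lines  # points plus one line per country
```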
Here’s where we’re going to demonstrate the way that you add layers to build up a plot.
Notice that when you call ggplot() on its own, without any geoms, no data gets plotted! You also need to tell it to add something.
ggplot(data = gapminder, aes(x = year, y = lifeExp, by = country, colour = continent))
In this case, let’s add a line!
ggplot(data = gapminder, aes(x = year, y = lifeExp, by = country, colour = continent)) +
geom_line()
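It is worth checking what decides which points get joined into a line. The sketch below uses group, which is the grouping aesthetic documented by ggplot2 (the code above writes by instead), and counts the groups in each version with layer_data(). The names p_group and p_nogroup are just for illustration.

```r
library(ggplot2)
library(gapminder)

# One line per country; colour only changes how the lines look
p_group <- ggplot(data = gapminder,
                  aes(x = year, y = lifeExp, group = country, colour = continent)) +
  geom_line()

# No country grouping: all rows of a continent are joined into a single line
p_nogroup <- ggplot(data = gapminder,
                    aes(x = year, y = lifeExp, colour = continent)) +
  geom_line()

# Count how many separate lines each plot draws
length(unique(layer_data(p_group)$group))
length(unique(layer_data(p_nogroup)$group))

p_group
```

If your line plot ever looks like a zig-zag mess, a missing grouping variable is a likely culprit.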
What if we want to add more than just a line? No problem, let ggplot know that you are going to add something else using the +. Let’s add some points.
ggplot(data = gapminder, aes(x = year, y = lifeExp, by = country, colour = continent)) +
geom_line() +
geom_point()
Okay, that’s great, but the points don’t really stand out against the colour of the lines. We can also be more specific with our layering and aesthetics. Notice how I moved the colour aesthetic into the geom_line() function. You can think of aesthetics that are listed in the ggplot() function as being the 'global' settings, laying down the defaults for any geoms that come later in the plot.
ggplot(data = gapminder, aes(x = year, y = lifeExp, by = country)) +
geom_line(aes(colour = continent)) +
geom_point()
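If you want to convince yourself of the global versus layer-specific behaviour, one way (a sketch, not the only way) is to look at the colours that the point layer ends up with in each version. The names p_global and p_local are made up for this example, and group = country is used for the grouping.

```r
library(ggplot2)
library(gapminder)

# Colour set globally: both the lines and the points inherit it
p_global <- ggplot(data = gapminder,
                   aes(x = year, y = lifeExp, group = country, colour = continent)) +
  geom_line() +
  geom_point()

# Colour set only inside geom_line(): the points fall back to their default
p_local <- ggplot(data = gapminder,
                  aes(x = year, y = lifeExp, group = country)) +
  geom_line(aes(colour = continent)) +
  geom_point()

# Layer 2 is geom_point(); compare how many distinct colours it uses
length(unique(layer_data(p_global, 2)$colour))  # one per continent
length(unique(layer_data(p_local, 2)$colour))   # a single default colour
```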
Remember, scales are what ultimately produce your axes and the legends for any variables that you map to an aesthetic.
What difference do you notice between these two graphs?
ggplot(data = gapminder, aes(x = year, y = lifeExp, by = country, colour = log(gdpPercap))) +
geom_point()
ggplot(data = gapminder, aes(x = year, y = lifeExp, by = country, colour = continent)) +
geom_point()
If you haven’t noticed yet, let’s look back at our data.
str(gapminder)
## 'data.frame': 1704 obs. of 6 variables:
## $ country : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ year : num 1952 1957 1962 1967 1972 ...
## $ lifeExp : num 28.8 30.3 32 34 36.1 ...
## $ pop : num 8425333 9240934 10267083 11537966 13079460 ...
## $ gdpPercap: num 779 821 853 836 740 ...
Cool! There is a difference in the way that the default colour scale operates for discrete and continuous variables!!!!!
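You can check which kind of variable you are dealing with, and switch between the two behaviours yourself. A minimal sketch: wrapping a numeric variable in factor() makes ggplot treat it as discrete. Using year for this is just an illustrative choice, as is the name p_disc.

```r
library(ggplot2)
library(gapminder)

class(gapminder$continent)   # a factor, so ggplot treats it as discrete
class(gapminder$gdpPercap)   # numeric, so ggplot treats it as continuous

# year is numeric, but factor(year) gets a discrete colour scale:
# one distinct colour per year instead of a gradient
p_disc <- ggplot(data = gapminder,
                 aes(x = gdpPercap, y = lifeExp, colour = factor(year))) +
  geom_point()

p_disc
```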
Ok, now that we’ve figured that out, we can change the colour scales for our variables. With your neighbour, see if you can change the colour scales that are being used. You’ll likely need to use the scales section of the cheatsheet and a little bit of googling. Let me know if you need a hint, but I want you to try first.
Stats summarise data. Some examples include boxplots, model fits, density plots, and bars/bins. These are very similar to using
1) Visualising overlapping data
Let’s return to our boring basic scatter plot.
ggplot(data = gapminder, aes(x = year, y = lifeExp)) +
geom_point()
Sometimes, when you have many data points that overlap, you want a better idea of how much data sits at a given point. I find this often comes up when you are plotting discrete variables against continuous ones. There are two ways we can get a better look at the data: we can use jitter or adjust the transparency of the points.
# Using geom_jitter
ggplot(data = gapminder, aes(x = year, y = lifeExp)) +
geom_jitter()
We could also use geom_point(position = position_jitter()) instead.
I would prefer if this plot didn’t have the points jittered so much. I can do this by changing the width and height of the jitter.
# Reduce the jitter by setting its width and height explicitly
ggplot(data = gapminder, aes(x = year, y = lifeExp)) +
geom_jitter(position = position_jitter(width = 0.5, height = 0.5))
Another option is to use transparency.
# Adjusting the transparency of points
ggplot(data = gapminder, aes(x = year, y = lifeExp)) +
geom_point(alpha = 0.5)
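The two approaches can also be combined. In this sketch the jitter is horizontal only (height = 0), so the life expectancy values stay exactly where they are, and a width of 1 keeps each point close to its own year. The specific numbers and the name p_jit are my own picks, not magic values.

```r
library(ggplot2)
library(gapminder)

# Jitter sideways only and make the points semi-transparent
p_jit <- ggplot(data = gapminder, aes(x = year, y = lifeExp)) +
  geom_jitter(width = 1, height = 0, alpha = 0.3)

p_jit
```

Jittering the y values as well would nudge each life expectancy away from its recorded value, which is usually not what you want for a measured variable.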
2) Themes
# Continuous variable examples
ggplot(data = gapminder, aes(x = lifeExp, y = log(pop))) +
geom_point(aes(colour = continent)) +
facet_wrap(~ continent)
ggplot(data = gapminder, aes(x = lifeExp, y = log(pop), colour = log(gdpPercap))) +
geom_point() +
scale_colour_gradient(low = 'blue', high = 'red')
ggplot(data = subset(gapminder, year == 2007)) +
geom_bar(aes(x = country, fill = continent))
ggplot(data = gapminder, aes(x = lifeExp, y = gdpPercap)) +
geom_point() +
scale_y_log10()
ggplot(data = gapminder, aes(x = lifeExp, y = log(gdpPercap))) +
geom_point()
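The last two plots look alike but are not quite the same thing. As a rough sketch of the difference: scale_y_log10() transforms the axis and keeps the labels in the original units, while log() inside aes() changes the values themselves, and log() in R is the natural log, not base 10. The names p_scale and p_trans are just for this example.

```r
library(ggplot2)
library(gapminder)

# Transform the scale: axis labels stay in dollars
p_scale <- ggplot(data = gapminder, aes(x = lifeExp, y = gdpPercap)) +
  geom_point() +
  scale_y_log10()

# Transform the data: axis labels show the logged values
p_trans <- ggplot(data = gapminder, aes(x = lifeExp, y = log10(gdpPercap))) +
  geom_point()

# Both versions put the points at the same positions
all.equal(sort(layer_data(p_scale)$y), sort(layer_data(p_trans)$y))

# log() and log10() are not interchangeable
log10(1000)  # 3
log(1000)    # natural log, roughly 6.9
```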
Take a look under the scales section. You’ll see that there
Let’s take a brief interlude and learn a new function. I would like you to use subset() to select data from the year 2007.
subset(gapminder, year == 2007)
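If you have used square-bracket indexing before, subset() does the same job with less typing. A small sketch comparing the two; the names gap_2007 and gap_2007_idx, and the Oceania example, are just for illustration.

```r
library(gapminder)

# subset() lets you refer to columns by name inside the call
gap_2007 <- subset(gapminder, year == 2007)

# The same rows using bracket indexing
gap_2007_idx <- gapminder[gapminder$year == 2007, ]

nrow(gap_2007)  # one row per country
identical(gap_2007$country, gap_2007_idx$country)

# Conditions can be combined, and select = picks columns
subset(gapminder, year == 2007 & continent == "Oceania",
       select = c(country, lifeExp))
```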
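# The variable to map to size was left blank in the original; fill one in before running
ggplot(data = subset(gapminder, year == 2007),
aes(x = lifeExp, y = gdpPercap, colour = gdpPercap)) +
geom_point(aes(size = ))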
Exercise 3
# Demo this one:
ggplot(data = gapminder, aes(x = gdpPercap, y = lifeExp)) +
geom_point() +
geom_smooth()
# Build this one:
ggplot(data = gapminder, aes(x = gdpPercap, y = lifeExp)) +
geom_point() +
scale_x_log10() +
stat_smooth(method = 'lm')
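What is stat_smooth(method = 'lm') actually fitting here? Because the x scale is log10-transformed before the stat runs, the line should correspond to a linear model of life expectancy on log10(gdpPercap). Here is a sketch that fits that model by hand so you can look at the numbers; p_lm and fit are illustrative names, and formula = y ~ x is spelled out only to keep the call explicit.

```r
library(ggplot2)
library(gapminder)

p_lm <- ggplot(data = gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.3) +
  scale_x_log10() +
  stat_smooth(method = "lm", formula = y ~ x)

# The same straight line, fitted directly
fit <- lm(lifeExp ~ log10(gdpPercap), data = gapminder)
coef(fit)  # intercept and slope on the log10 scale

p_lm
```

The slope is the change in life expectancy for a tenfold increase in GDP per capita.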
Other cool tips and tricks
Let’s return to our boring basic scatter plot.
ggplot(data = gapminder, aes(x = year, y = lifeExp)) +
geom_point()
This isn’t necessarily the best example of when you would use these tricks, but it will do.
ggplot(data = gapminder, aes(x = year, y = lifeExp)) +
geom_point(alpha = 0.4)
Another alternative, when you have lots of data points on top of each other, is to use geom_jitter().
ggplot(data = gapminder, aes(x = year, y = lifeExp)) +
geom_jitter()
The last thing we’ll talk about is faceting. This allows you to create panels of graphs, one for each subset of the data.
ggplot(data = gapminder, aes(x = year, y = lifeExp, color = continent)) +
geom_line() +
facet_wrap(~ country)
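With 142 countries, that plot has a lot of tiny panels. Here is a sketch of two ways to make facets easier to read: subset the data first, or use facet_grid() to lay panels out by two variables. The particular subsets (Oceania, and the years 1952 and 2007) and the names p_wrap and p_grid are arbitrary choices for illustration.

```r
library(ggplot2)
library(gapminder)

# facet_wrap() on a small subset: one panel per country in Oceania
p_wrap <- ggplot(data = subset(gapminder, continent == "Oceania"),
                 aes(x = year, y = lifeExp, colour = country)) +
  geom_line() +
  facet_wrap(~ country)

# facet_grid(): rows by year, columns by continent
p_grid <- ggplot(data = subset(gapminder, year %in% c(1952, 2007)),
                 aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  scale_x_log10() +
  facet_grid(year ~ continent)

p_wrap
p_grid
```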
Work with your neighbour now to familiarise yourself with the different ways you can facet your variables - using the faceting section on the ggplot2 cheatsheet.
You may need to subset. Some subsets to work with are year or continent.
e.g.,
p + geom_smooth(colour = 'black')
Let’s add lines for the different continents - this will be a stat because we’re summarising our data.
ggplot(data = gapminder, aes(x = year, y = pop, colour = country)) +
geom_point()
ggplot(data = gapminder, aes(x = year, y = pop, colour = continent)) +
geom_point() +
scale_y_log10()
ggplot(data = gapminder, aes(x = year, y = log10(pop), colour = continent)) +
geom_point()
ggplot(data = gapminder, aes(x = year, y = log10(pop), colour = country)) +
geom_point() +
stat_smooth()
ggplot(data = gapminder, aes(x = year, y = log10(pop), colour = continent)) +
geom_point() +
stat_smooth()
Exercises:
ggplot(data = subset(gapminder, year %in% 1982:2007)) +
geom_boxplot(aes(continent, gdpPercap))
Let’s go back to the first line plot we made. There are faster ways of doing this, but for now, let’s use ggplot.
ggplot(data = gapminder, aes(x = year, y = lifeExp, by = country, colour = continent)) +
geom_point()
What if we want to quickly see what the outliers are?
ggplot(data = gapminder, aes(x = year, y = lifeExp, by = country, colour = continent)) +
geom_point() +
geom_text(aes(label = country))
ggplot(data = gapminder, aes(x = year, y = lifeExp, by = country, colour = continent)) +
geom_point() +
geom_text(aes(label = year))
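Labelling every point makes the plot unreadable. One option, sketched below, is to give geom_text() its own smaller data set so that only the unusual points get a label. The cutoff of 30 years is an arbitrary threshold that I picked for illustration, and low_life and p_lab are made-up names.

```r
library(ggplot2)
library(gapminder)

# Keep only the rows we want to label (the cutoff is an assumption)
low_life <- subset(gapminder, lifeExp < 30)

p_lab <- ggplot(data = gapminder, aes(x = year, y = lifeExp, colour = continent)) +
  geom_point() +
  # This layer uses its own data, so only the low values get text
  geom_text(data = low_life, aes(label = country),
            vjust = -0.5, show.legend = FALSE)

p_lab
```

Each layer can take its own data argument, which overrides the data given in ggplot() for that layer only.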
ggplot(data = gapminder, aes(x = year, y = pop, colour = continent)) +
geom_point() +
scale_y_log10()
Homework:
ggplot(data = subset(gapminder, year %in% 1982:2007)) +
geom_boxplot(aes(continent, gdpPercap, colour = continent)) +
facet_wrap(~ year)
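# geom_density() needs an x aesthetic; lifeExp is an assumed choice, the original omitted it
ggplot(data = subset(gapminder, year %in% 1997:2007), aes(x = lifeExp)) +
geom_density()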
ggplot(data = gapminder, aes(x = year, y = lifeExp, by = country)) +
geom_point() +
geom_line(aes(colour = continent))